Journal of Computational Chemistry — Latest Matching Preprints

1

Glycine molecule radical: Predicted properties and dipeptide formation

Synak, J.; Blazewicz, J.

2026-07-10 bioinformatics 10.64898/2026.07.07.736934 medRxiv

Top 0.1%

6.8%

Numerous advances in quantum and computational chemistry over the last decades, as well as the development of computer science, have allowed the use of more precise and complex models, which can now be applied to much larger systems than in the past. The authors used Gaussian, coupled with theoretical methods, to predict a new pathway for peptide bond formation that could have taken place under prebiotic conditions. To tackle this difficult task, the properties of the substrates (glycine-derived radicals) were extensively analysed with Gaussian, taking resonance and hybridisation into account, to better understand the stereochemistry and the nature of the processes involved. The result is a series of reactions that, without any sophisticated catalysts and with relatively low energy thresholds (<20 kcal/mol), can lead to the formation of dipeptides (and, further, oligopeptides). The authors also hope that the other predicted properties of the investigated molecules will be of use to researchers who would like to use them in their experiments.
Author summary: Our goal was to investigate how the first peptide bonds could have formed under prebiotic conditions. This is an extremely important step in research into the beginning of life on Earth. We found a very promising series of reactions that uses atomic hydrogen as its only catalyst, and confirmed our expectations with theoretical calculations using Gaussian. Two glycine-derived radicals play major roles in the process, so we investigated their properties with Gaussian and verified that the results agree with our own theoretical considerations. This involved checking for possible geometric isomers and conformers and creating models that could explain their properties.
We are well aware that such calculations have limitations and that no model is 100% accurate, so our results should be confirmed by empirical data in the future. However, we still tried to be as thorough as possible in our approach to the subject.
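
To put a barrier of about 20 kcal/mol in perspective, the Eyring equation k = (k_B T / h) exp(−ΔG‡/RT) converts a free-energy barrier into a first-order rate constant. The sketch below is an illustration only: it assumes a transmission coefficient of 1 and a temperature of 298.15 K, neither of which is taken from the paper, whose prebiotic conditions may differ.

```python
import math

# Exact SI values of the constants (2019 SI redefinition)
K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol*K) (rounded)


def eyring_rate(dg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """First-order rate constant in 1/s from the Eyring equation.

    Assumes a transmission coefficient of 1 (an illustrative assumption).
    """
    prefactor = K_B * temperature_k / H  # k_B*T/h, about 6.2e12 1/s at 298 K
    return prefactor * math.exp(-dg_kcal_per_mol / (R_KCAL * temperature_k))


def half_life_s(rate_per_s: float) -> float:
    """Half-life of a first-order process, ln(2)/k."""
    return math.log(2) / rate_per_s


for barrier in (15.0, 20.0, 25.0):
    k = eyring_rate(barrier)
    print(f"dG = {barrier:4.1f} kcal/mol -> k = {k:.3g} 1/s, t1/2 = {half_life_s(k):.3g} s")
```

At 298.15 K the 20 kcal/mol case gives k of roughly 0.014 s⁻¹, a half-life of about 50 s, which is why barriers in this range are usually described as readily surmountable at room temperature; because of the exponential, each additional 5 kcal/mol slows the reaction by several orders of magnitude, and lower temperatures slow it further.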

2

Solvent-buffer effects in molecular dynamics simulations of nucleic acids

Baghel, N.; Shrivastava, P.; Mehra, R.

2026-07-06 biophysics 10.64898/2026.07.05.736650 medRxiv

Top 0.1%

3.6%

Molecular dynamics simulations of nucleic acids are commonly performed using a solvent-buffer distance of 10 Å between the solute surface and the simulation box boundary. Although this cell size has been extensively explored in protein simulations, its implications for nucleic acid dynamics are not well understood. Nucleic acids are elongated, highly charged, and flexible, with hydration and dynamical properties distinct from those of proteins; they may therefore require different solvent-layer considerations in simulations. In this study, we investigated the effect of simulation cell size on nucleic acid dynamics by simulating a 30-base-pair double-helical nucleic acid structure and its two single-stranded forms using solvent-buffer distances of 3, 5, 10, 15, and 20 Å. Smaller cells may impose restricted hydration, molecular crowding, and periodic-image interactions, whereas larger cells provide solvent space for conformational relaxation. A total of 45 µs of molecular dynamics simulations were performed (3 structures × 5 cell sizes × 3 replicates × 1 µs). Our results show that while the commonly used 10 Å buffer may be sufficient to maintain the stability of the double-stranded nucleic acid, larger cells are required to capture the conformational dynamics of single-stranded structures. In both cases, increasing the cell size to 15 or 20 Å enables broader conformational sampling. The first hydration shell exhibits reduced crowding in the 20 Å cell, consistent with more relaxed conformations. At larger cell sizes, single-stranded nucleic acids adopt compact, self-associated conformations for stability. Together, these results provide physical insight into how simulation cell size and solvent environment influence nucleic acid dynamics.
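
For scale, the sketch below estimates how the buffer distance changes the box dimensions and the approximate number of waters for an idealized, rigid, straight 30-bp B-form duplex in a rectangular box. The rise (3.4 Å per bp), diameter (~20 Å), bulk water number density, and box shape are textbook assumptions, not values from the study; real setups (e.g., truncated octahedral boxes, solute rotation, added ions) will differ.

```python
from dataclasses import dataclass

WATER_NUMBER_DENSITY = 0.0334  # water molecules per cubic angstrom (bulk, ~298 K)


@dataclass
class Box:
    x: float
    y: float
    z: float

    @property
    def volume(self) -> float:
        return self.x * self.y * self.z


def rectangular_box(solute_extent: tuple[float, float, float], buffer_a: float) -> Box:
    """Box edge = solute extent along each axis plus the buffer on both sides."""
    ex, ey, ez = solute_extent
    return Box(ex + 2 * buffer_a, ey + 2 * buffer_a, ez + 2 * buffer_a)


def approx_water_count(box: Box, solute_volume: float = 0.0) -> int:
    """Rough water count; ignores ions and uses only a crude solute-volume correction."""
    return round((box.volume - solute_volume) * WATER_NUMBER_DENSITY)


# Idealized 30-bp B-form duplex: ~3.4 A rise per bp, ~20 A diameter (assumed values)
DUPLEX_EXTENT = (30 * 3.4, 20.0, 20.0)

for buffer in (3, 5, 10, 15, 20):
    box = rectangular_box(DUPLEX_EXTENT, buffer)
    print(f"buffer {buffer:>2} A: box {box.x:.0f} x {box.y:.0f} x {box.z:.0f} A, "
          f"~{approx_water_count(box):,} waters")
```

Because the buffer is added on both sides of every axis, the thin dimensions of an elongated solute grow fastest in relative terms, so the water count (and cost) rises steeply between the 3 Å and 20 Å setups.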

3

Protein hydration and druggability

Panasenko, S.; Khorev, V.; Petukhov, M.

2026-07-08 biophysics 10.64898/2026.07.06.736750 medRxiv

Top 0.1%

2.1%

A priori assessment of the druggability of target proteins remains an unsolved problem in drug development. The empirical approaches widely used to address it have low efficiency. In this work, we investigated the hydration of a representative set of 65 evolutionarily and structurally unrelated human enzymes in water. Hydration depends only on the structure of the proteins, not on the physical and chemical properties of any potential ligands. The results show that, unlike in widely used approaches based on accessible surface area (ASA) calculations, the content of low-entropy water molecules (LEW) in the active sites of human enzymes is systematically higher than in other areas of their surface, including inactive cavities. Optimal criteria and a step-by-step procedure for identifying protein ligand-binding sites are proposed. The proposed approach, based on calculating the LEW content in the first hydration layer of potentially interesting target proteins, makes it possible to evaluate their suitability as drug targets even before any ligands are developed. The article also compares experimental Raman spectroscopy data with molecular dynamics simulations of water hydrogen bonds using three widely used water models (TIP3P, OPC3, and TIP5P) and standard algorithms for calculating hydrogen-bond networks.

4

Collinearity of Decomposed Energy Terms in MM-GBSA Binding Free Energy Calculations

Sevim, A.; Kocak, A.

2026-06-29 biophysics 10.64898/2026.06.24.734195 medRxiv

Top 0.1%

1.9%

The molecular mechanics-generalized Born surface area method (MMGBSA) is one of the most commonly used end-state approaches for calculating binding free energies in computational drug design and screening studies. It is customary to break up the free energy into van der Waals, electrostatic, polar solvation (GB), and nonpolar solvation (SA) terms and then either correlate these terms with experiment or assign physical meaning to each term. Here, we demonstrate that the underlying assumption, that the decomposed energy terms can be given independent fitting coefficients, may be invalid. Through analytic derivation and large-scale molecular dynamics simulations, we show that (i) the protein-ligand Coulomb interaction energy and the GB solvation correction are almost perfectly collinear (R² ≥ 0.99), reflecting their designed roles as vacuum electrostatics plus solvent screening, and (ii) the van der Waals interaction and the SA term likewise exhibit strong correlation, as both depend primarily on buried surface area. Interaction entropy and C2 entropy corrections are also found to depend strongly on the underlying electrostatic fluctuations, further reinforcing the redundancy. These findings hold both at the level of instantaneous trajectory fluctuations and when averaged across a diverse set of 139 protein-protein complexes, and they persist in both single-trajectory and three-trajectory MMGBSA protocols. Our results caution against using decomposed MMGBSA terms as independent predictors in regression models and suggest instead combining correlated terms into effective polar, nonpolar, and entropic contributions. Our study provides a systematic diagnosis of collinearity in MMGBSA and highlights pathways toward more interpretable and statistically robust predictive modeling.
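
The collinearity diagnosis can be illustrated with two standard tools: the Pearson correlation between two per-frame energy terms, and the variance inflation factor (VIF), which for a two-predictor regression is 1/(1 − r²). The sketch uses synthetic numbers generated under an assumed screening relationship (GB ≈ −0.9 × Coulomb plus noise), chosen only to mimic the kind of redundancy described; it is not the authors' analysis or data.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """Variance inflation factor of either predictor in a two-predictor regression: 1 / (1 - r^2)."""
    r2 = pearson_r(xs, ys) ** 2
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


# Synthetic per-frame energies in kcal/mol (NOT data from the study): the GB term is
# assumed to screen ~90% of the Coulomb term, plus small Gaussian noise.
rng = random.Random(0)
coulomb = [rng.uniform(-200.0, -50.0) for _ in range(1000)]
gb = [-0.9 * e + rng.gauss(0.0, 1.0) for e in coulomb]

r = pearson_r(coulomb, gb)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, VIF = {vif_two_predictors(coulomb, gb):.0f}")

# The remedy suggested in the abstract: merge the collinear pair into one effective polar term
polar = [c + g for c, g in zip(coulomb, gb)]
```

A VIF far above the usual rule-of-thumb thresholds (often 5 or 10) means the two coefficients cannot be estimated independently, which is the statistical core of the caution above.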

5

Homology-aware cross-validation strategies for generalization assessment in RNA structure prediction

Bugnon, L.; Kulemeyer, G.; Gerard, M.; Di Persia, L.; Stegmayer, G.; Milone, D. H.

2026-06-29 bioinformatics 10.64898/2026.06.28.735057 medRxiv

Top 0.1%

1.7%

RNA secondary structure prediction is a fundamental challenge in bioinformatics, essential for understanding the functional roles of non-coding RNAs. Recently, deep learning models have transformed the field with impressive results, prompting critical discussion of the validity of current cross-validation strategies. On the one hand, traditional random partitioning yields overoptimistic results due to data leakage from uncontrolled homology. On the other hand, removing from the training set all sequences that bear even the slightest resemblance to the test sequences penalizes learning-based methods by requiring generalization to completely out-of-distribution sequences. While it is very simple to remove sequences and retrain a machine-learned model, it is very difficult to remove the experimental data used for parameter tuning and the sequences used in developing classical thermodynamic methods. As a result, these methods often benefit from implicit knowledge leakage. In this work, we critically review existing cross-validation strategies for RNA secondary structure prediction: random splitting, clustering-based splitting, and leaving one RNA family out for testing. We analyze the advantages and limitations of each strategy and extend them toward future directions that ensure fair comparisons across the full range of sequence similarities, with the same rigor for both classical and learning-based methods.
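
The difference between random and clustering-based splitting is easiest to see in code. The sketch below assumes that each sequence already carries a cluster or family label (for example from a separate sequence-similarity clustering step, not shown) and keeps every cluster entirely on one side of each train/test split; with family labels and k equal to the number of families, it reduces to leave-one-family-out. It illustrates the general strategy, not the authors' protocol, and the function name is hypothetical.

```python
from collections import defaultdict


def cluster_kfold(cluster_of: dict[str, str], k: int):
    """Yield (train_ids, test_ids) so that each cluster appears in exactly one test fold.

    `cluster_of` maps sequence IDs to cluster (or family) labels, assumed to be
    computed beforehand. Clusters are assigned greedily, largest first, to the
    currently smallest fold to keep fold sizes balanced.
    """
    members = defaultdict(list)
    for seq_id, cluster in cluster_of.items():
        members[cluster].append(seq_id)

    folds = [[] for _ in range(k)]
    for cluster in sorted(members, key=lambda c: (-len(members[c]), c)):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[cluster])

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(s for j, fold in enumerate(folds) if j != i for s in fold)
        yield train, test


# Hypothetical labels: two tRNAs, two SRP RNAs, one 5S rRNA, one RNase P RNA
labels = {"s1": "tRNA", "s2": "tRNA", "s3": "5S", "s4": "SRP", "s5": "SRP", "s6": "RNaseP"}
for train, test in cluster_kfold(labels, k=3):
    print("test:", test, "train:", train)
```

Random splitting would correspond to ignoring the labels entirely, which is exactly what lets near-identical homologs land on both sides of a split.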

6

BioMetAll v2.0: Introducing Scores, Metal Discrimination, and Side-Chain Descriptors for Predicting Metal-Binding Sites in Proteins.

Marechal, J. D.; Fernandez Diaz, R.; Pena Losada, R.; Sanchez Aparicio, J. E.; Gao, W.; Alemany, M.

2026-07-12 bioinformatics 10.64898/2026.07.09.737562 medRxiv

Top 0.1%

1.7%

Predicting the location of metal-binding sites in proteins is crucial for both fundamental biological questions and biotechnological applications. Over the past decade, the growing number of metal-bound protein structures in the Protein Data Bank, combined with advanced statistical models such as deep learning, has accelerated the development of metal-binding site prediction tools. Several approaches are now available, offering high-quality benchmarks and predictive performance. Our initial contribution in this area was BioMetAll, whose first version was based on backbone pre-organization. Here, we introduce its second version, featuring two major updates: 1) metal-specific scoring functions and 2) prediction using backbone geometry alone or in combination with first-coordination-sphere descriptors. Besides demonstrating metal sensitivity and yielding better benchmarking results, this new version allows assessment of the relative influence of the metal's first coordination sphere versus backbone pre-organization on how metallic species bind to proteins.

7

Capabilities, specificity gaps and training-data dependence of AlphaFold3 across diverse application areas

Follonier, O.; Liu, Y.; Campomanes, P.; Lafrenaye, L.; Racle, J.; Alvarez, D.; van Gerwen, J.; Heinzmann, R.; Jänes, J.; Kummelstedt, E.; Durairaj, J.; Gfeller, D.; Vanni, S.; Beltrao, P.

2026-07-13 bioinformatics 10.64898/2026.07.13.738147 medRxiv

Top 0.1%

1.7%

Structure prediction models have moved from single proteins to assemblies that include diverse biomolecules and their modifications. AlphaFold3 (AF3) and related models extended structural modelling via an all-atom framework, opening many new potential applications in structural biology. We evaluate how well the new capabilities of AF3 translate into application tasks in diverse areas: prediction of ubiquitinated protein structures, T-cell receptor (TCR)-epitope recognition, antibody-antigen complexes, and protein-RNA and protein-lipid interactions. We find that, while AF3 can perform well in favourable settings, its performance is uneven across applications. In RNA-target predictions, model confidence fails to separate genuine from decoy interaction partners, and in several tasks accuracy depends on the presence of related complexes in the training set. Taken together, our assessment is more cautious than those of AF2, whose gains in modelling monomers and complexes were clear and broadly generalisable. AF3's extension to new biomolecule types shows less consistent performance and generalisation. AF3 can be a powerful tool for hypothesis generation and prioritisation, but the value of its predictions and confidence metrics depends strongly on the specific application area, and they must be interpreted with respect to training-set overlap. We expect that the benchmarks provided here will be useful for testing future developments in the structure prediction field.
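
One common way to quantify whether a confidence score separates genuine from decoy interaction partners is the ROC AUC: 1.0 means every genuine pair scores above every decoy, and 0.5 means no separation at all. The sketch computes it directly from its Mann-Whitney definition; the choice of metric and of confidence score (for example an interface score such as ipTM) is an assumption for illustration, not necessarily what the study used, and the scores are made up.

```python
def roc_auc(genuine_scores, decoy_scores):
    """Probability that a random genuine partner outscores a random decoy (ties count 1/2).

    This equals the area under the ROC curve, i.e. Mann-Whitney U / (n_genuine * n_decoy).
    """
    wins = 0.0
    for g in genuine_scores:
        for d in decoy_scores:
            if g > d:
                wins += 1.0
            elif g == d:
                wins += 0.5
    return wins / (len(genuine_scores) * len(decoy_scores))


genuine = [0.82, 0.64, 0.71, 0.55]  # hypothetical confidence scores for true partners
decoys = [0.60, 0.79, 0.52, 0.68]   # hypothetical scores for decoy partners
print(f"AUC = {roc_auc(genuine, decoys):.3f}")
```

The quadratic double loop is chosen for transparency; for large benchmarks, a rank-based computation or scikit-learn's `roc_auc_score` gives the same value more efficiently.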

8

A Requirement for K+ Ion Dehydration Governs Gating of the Shaker K+ Channel: Quantum Calculations Show Complex Interactions of Ions, Water, Protons, and Protein Side Chains

Kariev, A. M.; Monaco, R. R.; Green, M. E.

2026-07-07 biophysics 10.64898/2026.07.01.735716 medRxiv

Top 0.1%

1.5%

There is a vast literature on the voltage gating of ion channels, a fairly large fraction of it concerned with potassium channels, especially of the KV1 family, including Shaker. Experimental evidence derived from protein structure has been interpreted to give gating mechanisms that largely disregard water. We propose that the K+ ion, in order to pass through the gating region and enter the cavity pore, must be largely dehydrated. Competitive interactions of each individual hydration-shell water at the gate with counterions, protein, or other water molecules can remove one water at a time. There are several such interactions for the ion hydration shell; for the ion to pass through the gating region, there must be enough of them to leave the ion with at most two hydrating water molecules, in which case the gate is open. Protein conformational changes are secondary, small, and mostly unimportant. The hypothesis has a second part: protons, previously shown to be candidate carriers of the gating current (Kariev and Green, JPC B, 2019, Membranes, 2022, 2024), are capable of reaching the gate; adding four protons to the gate prevents dehydration, leaving the ion with at least three hydrating water molecules, enough to block passage. Quantum calculations presented here support the dehydration part of the hypothesis; they also mostly support the second part, concerning the protons, but further work will be required to confirm it fully. The hypothesis explains the experimental finding that the P475D mutant is essentially constitutively open, while the P475S mutant, with a wider gate opening, is closed at all relevant potentials; the computations presented here show the mechanism for this in detail, further confirming the first part of the hypothesis and largely, but not completely, confirming the second part, concerning protons, while showing where further work is needed.
This mechanism can also qualitatively account for flicker noise and fluctuations, and their consequences.

9

The Gompertz curve for estimating growth rates of Protein Data Bank and protein folds

Sato, K.; Tomii, K.

2026-06-26 bioinformatics 10.64898/2026.06.24.732253 medRxiv

Top 0.2%

1.1%

The Protein Data Bank (PDB) is an ever-growing, open-access repository of structural data on biological molecules. This international database has been instrumental in the development of artificial intelligence and deep learning models for protein structure prediction and design, so the growth of the PDB is a crucial factor in the further development of these models. After analyzing the growth trend in PDB depositions since the archive's launch, we found that it is well fitted by the Gompertz function, a growth curve used across various disciplines. Furthermore, we observed that the function also captures the "discovery of novel folds", i.e., the cumulative number of distinct folds among the protein domains that constitute most of the PDB. Based on the fitting results, we estimated the likely numbers of PDB entries and protein folds. These findings provide insight into the deceleration of growth in recent years and enable us to assess anticipated trends.
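
The Gompertz function is commonly written N(t) = K·exp(−b·exp(−c·t)), where K is the upper asymptote, b shifts the curve along the time axis, and c sets the growth rate. Its growth rate dN/dt peaks at t* = ln(b)/c, where N = K/e and the rate equals cK/e, and declines afterwards, which is what "deceleration" means in this model. The sketch evaluates these quantities for made-up parameters; they are not the fitted values reported in the paper.

```python
import math


def gompertz(t: float, k: float, b: float, c: float) -> float:
    """Gompertz curve N(t) = K * exp(-b * exp(-c * t)); K is the upper asymptote."""
    return k * math.exp(-b * math.exp(-c * t))


def gompertz_rate(t: float, k: float, b: float, c: float) -> float:
    """Instantaneous growth rate dN/dt = N(t) * b * c * exp(-c * t)."""
    return gompertz(t, k, b, c) * b * c * math.exp(-c * t)


def inflection(k: float, b: float, c: float) -> tuple[float, float, float]:
    """Return (t*, N(t*), peak rate) with t* = ln(b)/c, N(t*) = K/e, peak rate = c*K/e.

    t* is negative when b < 1, i.e. growth was already decelerating at t = 0.
    """
    t_star = math.log(b) / c
    return t_star, k / math.e, c * k / math.e


# Hypothetical parameters (NOT fitted values from the paper); t in years since launch
K, B, C = 400_000.0, 200.0, 0.12
t_star, n_star, peak = inflection(K, B, C)
print(f"growth peaks at t = {t_star:.1f} y, N = {n_star:,.0f}, rate = {peak:,.0f} entries/y")
for year in (20, 40, 60, 80):
    print(year, round(gompertz(year, K, B, C)), round(gompertz_rate(year, K, B, C)))
```

In practice the three parameters would be obtained by nonlinear least squares against cumulative deposition (or fold) counts, for example with `scipy.optimize.curve_fit`; that fitting step is omitted here.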

10

AptBacterialDB: A Comprehensive, Manually Curated Database of Antibacterial Aptamers

Bajiya, N.; Gupta, I.; Raghava, G. P. S.

2026-07-03 bioinformatics 10.64898/2026.07.01.735956 medRxiv

Top 0.3%

0.6%

In recent years, aptamers have transitioned from mere laboratory tools to highly potent molecular recognition agents capable of overcoming the strict limitations of conventional antibiotic therapies. We have developed AptBacterialDB, a large, comprehensive, manually curated database of experimentally validated antibacterial aptamers spanning 1996 to 2026. The database contains a total of 2131 aptamers targeting approximately 75 different bacterial classes and 124 aptamer targets; 95 entries are found in the UTexas databases, 97 in AptaDB, and 28 in Aptabase. It contains 1555 unique aptamer sequences, 189 unique modifications, 40 different selection approaches, and 44 different affinity methods. It integrates detailed annotations across about 20 fields, including sequence information, nucleic acid type, binding affinity, modifications, and experimental and functional details. The secondary structures of the aptamers were predicted using the ViennaRNA Package 2.0, showing that they mostly adopt stable conformations with a structured stem region. MySQL was used for database development, and a knowledge graph built with ArcadeDB/openCypher was integrated for graphical visualization of aptamer-target-organism relationships. The database provides different search modes, browsing, similarity search, REST API access, and links from entries to existing databases for a broader view of the aptamers. AptBacterialDB (https://webs.iiitd.edu.in/raghava/aptbacterialdb/) provides a user-friendly, centralized platform to accelerate antibacterial aptamer research, therapeutic development, biosensor design, and computational modelling.

11

PKProbDesign: RNA inverse folding including pseudoknots by optimizing thermodynamic folding probability

Otagaki, T.; Iwakiri, J.; Terai, G.; Asai, K.; Sato, K.

2026-07-11 bioinformatics 10.64898/2026.07.09.736945 medRxiv

Top 0.3%

0.6%

Motivation: RNA inverse folding, the design of RNA sequences that fold into specified target structures, is a central problem in RNA design, with applications in functional RNA engineering, synthetic biology, and nucleic-acid therapeutics. The task becomes especially challenging for pseudoknotted target structures because pseudoknots disrupt the nested structure assumed by standard thermodynamic folding models. Existing pseudoknot inverse-folding methods often rely on structure-predictor-based objectives, and direct optimization of the thermodynamic folding probability of a specified pseudoknotted target remains limited. Such optimization requires an evaluator that can assign target-specific folding probabilities within a pseudoknot-aware ensemble and can serve as an optimization signal.
Results: We present PKProbDesign, a sampling-based inverse-folding framework that directly optimizes a thermodynamic folding-probability objective for pseudoknotted targets. For each target, candidate sequences are scored by combining the folding probability of a pseudoknot-free scaffold with the conditional folding probability of the remaining extension component. On 354 PseudoBase++ targets, PKProbDesign achieved the highest folding probability on 221 targets, compared with 117 for DesiRNA and 16 for MODENA.
Conclusions: PKProbDesign demonstrates that pseudoknot inverse folding can be formulated around target folding probabilities rather than structure-prediction agreement alone. By combining scaffold decomposition with HFold/CParty-consistent conditional-ensemble evaluation, the method provides a practical probability-based framework for designing sequences for density-2 pseudoknotted targets.
Availability: The source code of PKProbDesign is available at https://github.com/TakumiOtagaki/PKProbDesign.
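
The scaffold-plus-extension scoring can be made concrete with a small sketch. It assumes, as an illustration rather than the paper's definition, that the scaffold is the set of "()" pairs in a dot-bracket target and the extension is every other bracket type, and it reads "combining" as the chain rule P(target) = P(scaffold) × P(extension | scaffold), summed in log space. The two probabilities would come from a pseudoknot-aware folding engine such as HFold/CParty, which is not called here; the function names and probability values are hypothetical.

```python
import math

BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def pairs_by_bracket(dot_bracket: str) -> dict[str, list[tuple[int, int]]]:
    """Parse a (possibly pseudoknotted) dot-bracket string into base pairs per bracket type."""
    closers = {close: open_ for open_, close in BRACKETS.items()}
    stacks = {open_: [] for open_ in BRACKETS}
    pairs = {open_: [] for open_ in BRACKETS}
    for pos, ch in enumerate(dot_bracket):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            open_ = closers[ch]
            if not stacks[open_]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs[open_].append((stacks[open_].pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return {b: sorted(p) for b, p in pairs.items() if p}


def split_scaffold(dot_bracket: str):
    """Assumed split: '()' pairs form the pseudoknot-free scaffold, all other pairs the extension."""
    pairs = pairs_by_bracket(dot_bracket)
    scaffold = pairs.get("(", [])
    extension = sorted(p for b, ps in pairs.items() if b != "(" for p in ps)
    return scaffold, extension


def log_target_probability(p_scaffold: float, p_ext_given_scaffold: float) -> float:
    """log P(target) = log P(scaffold) + log P(extension | scaffold) (chain rule)."""
    return math.log(p_scaffold) + math.log(p_ext_given_scaffold)


scaffold, extension = split_scaffold("((..[[..))..]]")
print(scaffold, extension)  # [(0, 9), (1, 8)] [(4, 13), (5, 12)]
print(log_target_probability(0.42, 0.15))  # placeholder probabilities
```

Working in log space keeps the objective numerically stable when both factors are small, which is common for long targets.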

12

AptCancerDB: A Curated Knowledgebase and Translational Discovery Platform for Anticancer Aptamers

Bajiya, N.; Singh, S.; Raghava, G. P. S.

2026-07-09 cancer biology 10.64898/2026.07.02.735999 medRxiv

Top 0.4%

0.5%

Aptamers are emerging as important molecular recognition ligands in oncology, playing significant roles in cancer diagnostics, targeted therapies, drug delivery systems, and molecular imaging. Numerous aptamers have advanced to clinical trials, indicating their potential for real-world applications; however, existing databases fail to capture this clinical progress. To bridge this gap, we developed AptCancerDB (https://webs.iiitd.edu.in/raghava/aptcancerdb/), a comprehensive, manually curated database of experimentally verified anticancer aptamers. The current release contains 1,941 entries collected from studies published between 2000 and 2025, covering 29 cancer types and approximately 200 cancer cell lines, with direct links to 22 clinical trials. Each entry is annotated with sequence information, target details, cancer type, cell line, SELEX methodology, affinity determination data, chemical modifications, and biological activities. The dataset is dominated by ssDNA (82.7%), reflecting its superior stability and ease of synthesis, whereas ssRNA accounts for only 16.6% and appears primarily in studies targeting complex intracellular or protein-protein interactions. To facilitate structural analysis, predicted secondary structures, dot-bracket notations, specific structural elements, and minimum free energy values are also included. AptCancerDB integrates a MySQL backend with an ArcadeDB/OpenCypher-based knowledge graph, enabling exploration of relationships among aptamers, targets, cancer types, cell lines, and functional applications. The platform provides advanced search and browsing facilities, BLASTn-based similarity searching, and a GC calculator. Built on a modern, responsive frontend (React/TypeScript/Tailwind CSS), the platform also includes a REST API for data retrieval.
By integrating fragmented experimental data into a unified, cancer-focused platform, AptCancerDB serves as a valuable resource for comparative analysis, aptamer discovery, and the development of next-generation aptamer-based diagnostics and therapeutics.
Highlights:
- Curated knowledgebase of experimentally validated anticancer aptamers.
- AptCancerDB contains therapeutic, tumor-homing, and cell-penetrating aptamers.
- Summarizes clinical progress and translational trends in anticancer aptamer research.
- Supports rational aptamer design using molecular, functional, and clinical annotations.
- Disease-focused resource for cancer diagnosis, therapy, and drug delivery.
Teaser: AptCancerDB maintains experimentally validated anticancer aptamers relevant to diagnosis, drug delivery, and therapy.

13

A generalisable framework to inject distance information into AlphaFold-like structure predictors

Mirabello, C.; Wallner, B.; Orekhov, V.; Nystedt, B.; Pearce, N.

2026-07-06 bioinformatics 10.64898/2026.07.02.736010 medRxiv

Top 0.4%

0.5%

Structure prediction methods are now highly successful at predicting three-dimensional structures from sequence. However, it is still often desirable to supplement these methods with external priors on pairwise distances in the structures. We present a general method for injecting prior information into AlphaFold-like structure predictors by biasing the pair representation to produce desired features in the distogram, which are then reflected in the structures. We apply this approach to sample alternate states by selectively pushing or pulling mobile amino acid pairs, to integrate NMR NOESY data with structure prediction, and to improve the success of protein-protein and protein-ligand complex prediction. We show that the approach is applicable both to AlphaFold2 and to a reproduction of AlphaFold3 (OpenFold3). The method, resTrain, is open source and available to all users on GitHub and as a Colab notebook: https://github.com/clami66/resTrain

14

ComplexDesign: sequence-hallucination design of protein binders bridging multiple proteins

Xu, J.; Ren, M.; Qi, N.; Zhang, X.; He, Z.; Yu, C.; Bu, D.

2026-06-24 bioinformatics 10.64898/2026.06.21.733655 medRxiv

Top 0.4%

0.5%

Motivation: Designing multichain protein complexes requires coordinating the folding of the component proteins with the formation of their interfaces. Existing methods, however, remain limited in their ability to satisfy these requirements simultaneously, especially for trimeric and tetrameric complexes. In an important practical scenario, designing a binder that bridges two target proteins into a ternary complex requires flexibility in the relative arrangement of the two targets, posing an additional challenge for existing design methods.
Results: We present ComplexDesign, a hallucination-based approach for multichain protein design. ComplexDesign performs structure-prediction-guided sequence optimization to simultaneously fold each protein chain and form the inter-chain interactions that bind them together. To provide the flexibility required to arrange the target proteins appropriately, ComplexDesign introduces a specialized masking mechanism that enables exploration of possible relative arrangements rather than being limited to predefined ones. Across a comprehensive set of benchmarks with various chain lengths, ComplexDesign outperformed existing methods in the unconditional design of dimers, trimers, and tetramers, with design success rates exceeding 50%, supporting its capability for multichain complex design. Furthermore, in multi-target binder design, ComplexDesign produced high-confidence, self-consistent ternary complexes for 8 of 10 target pairs. These results establish ComplexDesign as an effective tool for multichain protein design, with particular utility for designing binders that bridge two target proteins.
Availability and implementation: The source code of ComplexDesign will be made publicly available upon publication.

15

Structural Topology-based Electrostatic Model (STEM) Reveals Ion-Coordination Exchange as a Driver of RNA Folding Dynamics

Mainan, A.; Jaiswar, A.; Onuchic, J. N.; Sanbonmatsu, K. Y.; Roy, S.

2026-07-01 biophysics 10.64898/2026.06.27.734987 medRxiv

Top 0.5%

0.4%

RNA is a highly charged polyelectrolyte whose folding into functional architectures depends on an ionic atmosphere that screens strong electrostatic repulsion along the phosphate backbone. Whereas monovalent ions primarily stabilize secondary structure, divalent magnesium (Mg2+) drives tertiary folding, often through site-specific binding in a variety of dynamic coordination modes. Current RNA structure-prediction frameworks rely largely on static direct-contact information, overlooking ion-mediated interactions and, in particular, the dynamic exchange between direct (inner-sphere) and solvent-separated (outer-sphere) Mg2+-phosphate coordination that often controls RNA conformational transitions. Here, we introduce the Structural Topology-based Electrostatic Model (STEM), a hybrid implicit-explicit framework that explicitly captures how the dynamic exchange between distinct ion-coordination modes dictates folding pathways. STEM combines explicit Mg2+ ions, which resolve site-specific interactions, with implicit K+ ions, which describe electrostatic screening by counterion condensation through a generalized Manning counterion-condensation model, enabling computationally efficient exploration of RNA folding landscapes. The model accurately reproduces crystallographic ion-binding sites, experimental preferential ion-interaction coefficients, and radii of gyration derived from small-angle X-ray scattering (SAXS) across diverse RNA systems. Applied to a 58-nt rRNA fragment, STEM reveals that folding from an intermediate to the native state is driven by a tertiary contact mediated by a chelated Mg2+ ion, and it captures the resulting coordination-dependent conformational breathing.
By shifting the paradigm from static direct-contact descriptions to ion-mediated dynamic interactions, STEM provides a physically grounded framework for predicting dynamic ensembles of RNA structures, resolving their folding free-energy landscapes, and elucidating the mechanisms of RNA folding and function beyond native conformations across physiological salt conditions.
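
The abstract states that K+ screening is handled with a generalized Manning condensation model, whose exact form is not given here. For orientation only, the classic Manning limiting law for an infinitely thin line charge is sketched below: with Bjerrum length l_B and axial charge spacing b, ξ = l_B/b, and counterions of valence z condense to neutralize a fraction 1 − 1/(zξ) of the backbone charge whenever zξ > 1. This is the textbook law, not STEM's generalized model, and the 1.7 Å spacing is the commonly quoted B-DNA value, used purely as an illustrative assumption.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K


def bjerrum_length_angstrom(eps_r: float = 78.5, temperature_k: float = 298.15) -> float:
    """Distance at which two unit charges interact with energy k_B*T, in angstroms."""
    lb_m = E_CHARGE**2 / (4 * math.pi * EPS0 * eps_r * K_B * temperature_k)
    return lb_m * 1e10


def manning_fraction(charge_spacing_a: float, valence: int, lb_a: float) -> float:
    """Classic Manning limiting-law fraction of backbone charge neutralized by condensed ions.

    xi = l_B / b; condensation occurs only when valence * xi > 1.
    """
    xi = lb_a / charge_spacing_a
    return max(0.0, 1.0 - 1.0 / (valence * xi))


lb = bjerrum_length_angstrom()
for z in (1, 2):
    print(f"z = {z}: l_B = {lb:.2f} A, condensed fraction = {manning_fraction(1.7, z, lb):.2f}")
```

In water at room temperature l_B is about 7.1 Å, so with b = 1.7 Å roughly 76% of the charge is neutralized by monovalent and about 88% by divalent counterions in this limiting law, one way to see why divalent ions such as Mg2+ are so effective.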

16

Molecular crowding: impacts on the activity of the 10-23 DNAzyme

Kirchgaessler, N.; Rosenbach, H.; Biehl, R.; Steger, G.; Boerner, R.; Span, I.

2026-07-01 biophysics 10.64898/2026.06.30.735450 medRxiv

Top 0.5%

0.4%

The growing number of approved nucleic acid therapeutics illustrates the potential of treating diseases by targeting their genetic blueprints in vivo. The 10-23 DNAzyme can cleave a wide range of target RNAs with high selectivity. However, its poor performance in vivo restricts its therapeutic application as a gene-silencing agent. Studies on ribozymes have shown that the crowded cellular environment and associated effects can impact ribozyme folding and thermostability, changing their activity. This raises the question of whether DNAzymes are also affected by molecular crowding. Here, we investigate the functional and structural influence of molecular crowding conditions on the 10-23 DNAzyme. The stability and activity of a PrP-specific 10-23 DNAzyme were examined in the presence of PEG, dextran, and osmolytes. Our results indicate that osmolytes decrease DNAzyme activity in a concentration-dependent manner, while certain PEG and dextran concentrations promote activity. To rationalize these observations, we studied the effect of the cosolutes on physicochemical solution properties and on the structure of the DNAzyme:RNA complex using FCS and SAXS. The data reveal that enhanced activity is observed under conditions where a combination of physicochemical properties matches an optimum that appears to depend on the metal ion cofactor. The data indicate a smaller structural influence under such conditions. We propose that a certain degree of molecular crowding is required to favor a state that allows higher catalytic turnover. In addition, we show that the requirement for magnesium and manganese as cofactors remains unchanged under the conditions applied. Our work contributes to a better understanding of how the cellular environment affects DNAzyme structure and function.

17

Generative continuous time model reveals epistatic signatures in protein evolution

Pagnani, A.; Barrat-Charlaix, P.

2026-07-10 bioinformatics 10.1101/2025.09.17.676821 medRxiv

Top 0.5%

0.4%

Protein evolution is fundamentally shaped by epistasis, where the effect of a mutation depends on the sequence context. As standard phylogenetic methods assume independently evolving sites, there is a need for more complex models based on accurate estimates of the fitness landscape. Good candidates are modern generative models, such as the Potts model, which successfully capture epistatic effects. However, recent generative evolutionary models usually use discrete time, making them difficult to integrate with the standard frameworks of evolutionary biology. We introduce a continuous-time sequence evolution model that uses the Gillespie algorithm and is parameterized by a generative Potts model. This approach enables us to simulate realistic, family-specific evolutionary trajectories and allows direct comparison with independent-site models. Surprisingly, we find that while epistasis significantly slows down evolution, it does not change the average evolutionary rates at individual sites. This is explained by the rate heterogeneity caused by context dependence: we show that the rate at some positions varies from zero to high values depending on the context, while other positions are essentially independent of the context. Finally, we show that epistasis leads to a systematic underestimation bias in the inference of evolutionary distances between sequences. Overall, our work provides a new tool for simulating realistic protein evolution and offers novel insights into the complex interplay between epistasis and evolutionary dynamics.
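
The Gillespie step itself is generic: from the current sequence, compute the rate of every possible single-site substitution, draw an exponential waiting time with the total rate, then pick one substitution with probability proportional to its rate. The sketch below is a toy version with a random Potts model. The rate function (a Glauber-type rule that satisfies detailed balance with respect to the Potts distribution), the model size, and the parameters are all assumptions, not the authors' choices.

```python
import math
import random


def potts_log_weight(seq, h, J):
    """Unnormalized log-probability of a sequence: fields plus pairwise couplings (i < j)."""
    n = len(seq)
    total = sum(h[i][seq[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += J[i][j][seq[i]][seq[j]]
    return total


def glauber_rate(delta, mu=1.0):
    """Assumed substitution rate; rate(s->s') / rate(s'->s) = exp(delta) (detailed balance)."""
    return mu / (1.0 + math.exp(-delta))


def gillespie(seq, h, J, q, t_max, rng):
    """Simulate single-site substitutions in continuous time up to t_max.

    Returns (final sequence, t_max, number of substitutions). The full log-weight is
    recomputed for clarity; a real implementation would update only terms touching site i.
    """
    seq = list(seq)
    t, n_events = 0.0, 0
    while True:
        current = potts_log_weight(seq, h, J)
        events, rates = [], []
        for i in range(len(seq)):
            old = seq[i]
            for a in range(q):
                if a == old:
                    continue
                seq[i] = a  # try the substitution, score it, then undo it
                delta = potts_log_weight(seq, h, J) - current
                seq[i] = old
                events.append((i, a))
                rates.append(glauber_rate(delta))
        total = sum(rates)
        dt = rng.expovariate(total)  # waiting time ~ Exponential(total rate)
        if t + dt > t_max:
            return seq, t_max, n_events
        t += dt
        i, a = rng.choices(events, weights=rates, k=1)[0]  # chosen with prob. rate / total
        seq[i] = a
        n_events += 1


# Toy random Potts model (hypothetical sizes and parameters)
rng = random.Random(1)
L, q = 6, 3
h = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(L)]
J = [[[[rng.gauss(0.0, 0.5) for _ in range(q)] for _ in range(q)] for _ in range(L)] for _ in range(L)]
final_seq, t_end, n_subs = gillespie([0] * L, h, J, q, t_max=5.0, rng=rng)
print(final_seq, n_subs)
```

Because every rate depends on the whole sequence through the couplings, the same site can be nearly frozen in one background and fast in another, which is the kind of context-dependent rate heterogeneity described above.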

18

Solvation Shapes the Conformational Landscape of a Therapeutically Relevant SMN2 Splice-Site Defect

Khaled, M.; Leuschner, L.; Palomino-Hernandez, O.

2026-07-06 biophysics 10.64898/2026.07.01.735918 medRxiv

Top 0.6%

0.4%


The SMN2 exon 7 5' splice-site/U1 snRNA duplex contains an A$_{-1}$ bulge that weakens splice-site recognition and represents a therapeutically relevant RNA connectivity defect, yet its conformational landscape and coupling to solvation remain poorly understood. Here, we performed enhanced-sampling Hamiltonian replica-exchange molecular dynamics simulations of the SMN2 splice-site duplex using four explicit-solvent models (OPC, TIP4P-Ew, TIP3P, and SPC/E) and characterized the sampled ensemble using linear and machine-learned latent representations. Across representations, the A$_{-1}$ defect consistently populated three metastable conformational states distinguished by local duplex geometry, base stacking, hydrogen-bonding patterns, and solvent exposure. The relative populations of these states, together with first-shell hydration and Na$^+$ distributions around the defect, varied substantially across water models, demonstrating that hydration and ion organization actively shape the equilibrium between locally accommodated and solvent-exposed conformations of the SMN2 splice-site bulge. Our results shed light on the conformational components of this therapeutic RNA target and highlight the choice of solvation model as an important consideration for molecular simulations of RNA splice-site recognition and small-molecule repair.
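In Hamiltonian replica exchange, replicas share one temperature but run with different Hamiltonians, and neighboring replicas periodically try to swap configurations. The sketch below shows the standard Metropolis acceptance test for such a swap; the abstract does not say which Hamiltonian-scaling scheme was used, so the function names, kcal/mol units, and 300 K default are assumptions.

```python
import math
import random

KB_KCAL = 0.001987204  # Boltzmann constant in kcal/(mol K)


def swap_acceptance(u_i_xi, u_i_xj, u_j_xi, u_j_xj, temperature=300.0):
    """Metropolis probability of exchanging configurations between replicas i and j.

    Both replicas share one temperature but use different Hamiltonians;
    u_a_xb is the potential energy of configuration x_b under Hamiltonian a (kcal/mol).
    """
    beta = 1.0 / (KB_KCAL * temperature)
    delta = (u_i_xj + u_j_xi) - (u_i_xi + u_j_xj)
    return 1.0 if delta <= 0.0 else math.exp(-beta * delta)


def attempt_swap(energies, i, j, temperature=300.0, rng=random):
    """energies[a][b] = U_a(x_b); returns True if the exchange is accepted."""
    p = swap_acceptance(energies[i][i], energies[i][j],
                        energies[j][i], energies[j][j], temperature)
    return rng.random() < p


# Usage: cross-evaluated energies (kcal/mol) for two neighboring replicas
energies = [[-1000.0, -998.5],
            [-1002.0, -1001.0]]
p = swap_acceptance(energies[0][0], energies[0][1], energies[1][0], energies[1][1])
```

Each swap needs every configuration re-evaluated under its partner's Hamiltonian, which is why the energies are passed as a cross-evaluated matrix rather than one energy per replica.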

19

EnzyKAN: Protein Language Model Embeddings and Kolmogorov-Arnold Network Variants for Enzyme Commission Classification with a Proposed Electron-Transfer Physics Feature Framework

R, S.; Reddy, B. R. R.

2026-06-29 bioinformatics 10.64898/2026.06.23.734004 medRxiv

Top 0.6%

0.4%


Motivation: Computational enzyme classification has previously utilised sequence homology features and protein language model embeddings. The Kolmogorov-Arnold Network (KAN) paradigm, which uses learnable edge functions rather than fixed ones, has shown promising results in biological sequence tasks. Results: We present a fully reproducible investigation of KAN variants for seven-class EC classification on up to 9,516 labelled sequences from the CLEAN benchmark [1] (9,386 for language model experiments). In the sequence-only setting, fixed-basis KAN variants moderately outperformed an MLP baseline (macro F1 = 0.17-0.29). Using ESM-2 650M embeddings [2] greatly improved results under 5-fold cross-validation: MLP macro F1 = 0.750 ± 0.009, accuracy = 0.823 ± 0.009; learnable SineKAN macro F1 = 0.716 ± 0.023, accuracy = 0.788 ± 0.019. MLP performed comparably but did not exceed conventional baselines. As an aside, we introduce, but do not investigate, an approach to EC oxidoreductase sub-classification using a Marcus theory-based electron-transfer feature framework. Availability: Code and result files are available at https://github.com/sanjuz-cas/ENZYKAN.
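The idea of learnable edge functions, and the macro F1 metric used to report results, can be sketched as follows. This is a hypothetical fixed-basis layer written for illustration, not the EnzyKAN or SineKAN code: each edge carries its own function built from fixed sine frequencies with trainable coefficients. The 1280-dimensional input matches the ESM-2 650M embedding width, while mean-pooling to one vector per sequence is an assumption.

```python
import numpy as np


class SineBasisKANLayer:
    """Hypothetical fixed-basis KAN layer.

    Each edge i -> j applies phi_ji(x) = sum_k C[j, i, k] * sin(k * x), with fixed
    frequencies k = 1..K and trainable coefficients C. Output j sums phi_ji over inputs i,
    so there is no separate weight matrix or fixed node activation as in an MLP.
    """

    def __init__(self, in_dim, out_dim, n_basis=4, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.freqs = np.arange(1, n_basis + 1)  # fixed basis, not trained
        self.coef = rng.normal(scale=1.0 / np.sqrt(in_dim * n_basis),
                               size=(out_dim, in_dim, n_basis))  # trainable

    def __call__(self, x):
        basis = np.sin(x[..., None] * self.freqs)  # (batch, in_dim, n_basis)
        return np.einsum("bik,jik->bj", basis, self.coef)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true or y_pred."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))


# Usage: 8 pooled 1280-d embeddings -> 64 hidden units -> 7 top-level EC classes
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 1280))
hidden = SineBasisKANLayer(1280, 64, rng=rng)
head = SineBasisKANLayer(64, 7, rng=rng)
logits = head(hidden(emb))
pred = logits.argmax(axis=1)
```

Macro F1 weights every EC class equally, so poor performance on a rare class lowers it even when overall accuracy stays high. This is consistent with the reported macro F1 values sitting below the corresponding accuracies.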

20

Accurate ΔTm Prediction Without Protein Structure Inputs for Biomolecular Stability

Siegismund, D.; Wieser, M.; Natali, E.; Steigele, S.

2026-07-06 bioinformatics 10.64898/2026.07.02.735991 medRxiv

Top 0.6%

0.3%


Predicting protein stability, such as changes in melting temperature (ΔTm) caused by mutations, is a critical task in therapeutic protein engineering and drug discovery. This is reflected in a growing solution space that includes both AI-based sequence- and structure-based methods. This paper demonstrates that accurate ΔTm prediction does not require structural input features: state-of-the-art results can be achieved with a careful training design for large sequence-based protein language models. We combine an autoresearch-inspired setup search with controlled ablation studies and show that a well-tuned sequence-only ESM2-650M model outperforms structure-informed methods in our benchmark, achieving the lowest error (MAE/RMSE) and competitive Pearson correlation without pH or structural inputs. We further show that choices such as loss function, pooling strategy, auxiliary supervision, and fine-tuning regime materially affect performance.
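A model can have the lowest error yet only a competitive correlation because MAE/RMSE and Pearson r measure different things. The sketch below computes all three for ΔTm predictions; the function name and toy values are illustrative only and are not taken from the paper's benchmark.

```python
import numpy as np


def delta_tm_metrics(y_true, y_pred):
    """MAE, RMSE and Pearson r between measured and predicted ΔTm (same units, e.g. °C)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "mae": float(np.mean(np.abs(err))),         # average absolute error
        "rmse": float(np.sqrt(np.mean(err ** 2))),  # penalises large errors more heavily
        # linear association; unaffected by a constant offset or positive rescaling
        "pearson": float(np.corrcoef(y_true, y_pred)[0, 1]),
    }


# Toy example with three mutants
m = delta_tm_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 4.0])
```

Because Pearson r ignores systematic offsets, a model that is well calibrated in absolute terms (low MAE/RMSE) need not also rank highest on correlation.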